Geospatial Stream Query Processing using Microsoft SQL Server StreamInsight
Abstract
Microsoft SQL Server spatial libraries contain several components that handle geometrical and geographical data types. With advances in geo-sensing technologies, there has been an increasing demand for geospatial streaming applications. Microsoft SQL Server StreamInsight (StreamInsight, for brevity) is a platform for developing and deploying streaming applications that run continuous queries over high-rate streaming events. With its extensibility infrastructure, StreamInsight enables developers to integrate their domain expertise within the query pipeline in the form of user defined modules. This demo utilizes the extensibility infrastructure in Microsoft StreamInsight to leverage its continuous query processing capabilities in two directions. The first direction integrates SQL spatial libraries into the continuous query pipeline of StreamInsight. StreamInsight provides a well-defined temporal model over incoming events while SQL spatial libraries cover the spatial properties of events to deliver a solution for spatiotemporal stream query processing. The second direction extends the system with an analytical refinement and prediction layer. This layer analyzes historical data that has been accumulated and summarized over the years to refine, smooth and adjust the current query output as well as predict the output in the near future. The demo scenario is based on transportation data in Los Angeles County.
Introduction
geostreaming, StreamInsight, spatial libraries
Challenges
- Port geospatial libraries to the streaming domain with the incremental, single-pass processing model in mind.
- Make decisions based on the continuous query results.
Contributions
- Integrate Microsoft SQL Server Spatial Libraries into Microsoft StreamInsight to support the online processing of geo-stream data. Special attention is given to the incremental evaluation of spatial operations.
- Implement an online analytical refinement and prediction layer that enables querying of historical (archived) stream data.
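The incremental evaluation mentioned in the first contribution can be illustrated with a minimal sketch. The class below is hypothetical: it is not StreamInsight's UDA interface (which is .NET-based) and it uses a plain axis-aligned rectangle rather than the SQL spatial types. It only shows the general idea of updating a spatial aggregate as events arrive and expire, instead of rescanning the whole window for every result.

```python
from collections import deque


class IncrementalRegionCount:
    """Hypothetical incremental aggregate: counts events that fall inside a
    rectangle over a sliding time window. Each event is touched once on
    arrival and once on expiry."""

    def __init__(self, min_x, min_y, max_x, max_y, window):
        self.bounds = (min_x, min_y, max_x, max_y)
        self.window = window   # window length, in the same unit as timestamps
        self.events = deque()  # (timestamp, inside_flag) in arrival order
        self.count = 0         # running answer

    def _inside(self, x, y):
        min_x, min_y, max_x, max_y = self.bounds
        return min_x <= x <= max_x and min_y <= y <= max_y

    def add(self, timestamp, x, y):
        """Process one event and return the current count.
        Timestamps are assumed to be non-decreasing."""
        # Expire events outside the window (timestamp - window, timestamp].
        while self.events and self.events[0][0] <= timestamp - self.window:
            _, was_inside = self.events.popleft()
            self.count -= int(was_inside)
        inside = self._inside(x, y)
        self.events.append((timestamp, inside))
        self.count += int(inside)
        return self.count


# Toy usage with made-up coordinates and a window of 5 time units.
agg = IncrementalRegionCount(0, 0, 10, 10, window=5)
counts = [
    agg.add(1, 2, 2),    # inside
    agg.add(2, 20, 20),  # outside
    agg.add(3, 5, 5),    # inside
    agg.add(6, 1, 1),    # inside; the event at t=1 expires
    agg.add(9, 50, 50),  # outside; the events at t=2 and t=3 expire
]
```

The state kept per window is just the running count plus the queue of pending expirations, which is what makes the single-pass model workable for this kind of operation.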
Architecture
StreamInsight
- Input Adapter: continuously listens on a TCP port for the streaming data. Once a packet of data is retrieved, the corresponding events are created and enqueued to the streaming engine.
- Streaming Engine: runs pre-defined queries on input events, and also performs query fusing, operator sharing, and query and stream partitioning. StreamInsight provides user-defined aggregate (UDA), user-defined operator (UDO), and user-defined function (UDF) facilities.
- Output Adapter
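The flow from input adapter to streaming engine to output adapter can be sketched as follows. This is an illustration only, not StreamInsight code: the record format (`sensor_id,timestamp,speed`), the `slow_traffic` query, and the end-of-stream marker are all assumptions made for the example, and packets are passed in as bytes rather than read from a TCP socket so that the sketch runs without a network.

```python
from collections import namedtuple
from queue import Queue

# Hypothetical event layout; the notes do not specify the real schema.
Event = namedtuple("Event", "sensor_id timestamp speed")


def input_adapter(packets, queue):
    """Turn raw packets into events and enqueue them for the engine."""
    for packet in packets:
        for line in packet.decode("utf-8").splitlines():
            sensor_id, timestamp, speed = line.split(",")
            queue.put(Event(sensor_id, int(timestamp), float(speed)))
    queue.put(None)  # end-of-stream marker (an assumption of this sketch)


def streaming_engine(queue, query):
    """Apply a pre-defined query to each event as it is dequeued."""
    while True:
        event = queue.get()
        if event is None:
            break
        result = query(event)
        if result is not None:
            yield result


def output_adapter(results):
    """Deliver query results to the consumer; here they are just collected."""
    return list(results)


def slow_traffic(event, threshold=30.0):
    """Example query: keep only readings below a speed threshold."""
    return event if event.speed < threshold else None


packets = [b"s1,1,55.0\ns2,1,12.5\n", b"s1,2,28.0\n"]
queue = Queue()
input_adapter(packets, queue)
congested = output_adapter(streaming_engine(queue, slow_traffic))
```

In a deployed system the three stages would run concurrently; they are run one after another here only to keep the example short.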
Online Analytical Refinement and Prediction Layer
- Use PCA to build sketches of the historical data.
- The more correlated the streaming data is, the fewer components are needed to create accurate sketches.
Given efficient access to historical data through UDAs, one can accomplish the following refinement and prediction functions.
Refinement functions
- Substituting missing streaming data
- Smoothing noisy input data
- Detecting anomalies
Prediction functions
- Predicting near future trends
- Responding to anomalies
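As a rough illustration of how a PCA-based sketch of historical data could support one of the refinement functions (substituting missing streaming data), consider the following pure-Python sketch. It is an assumption-laden toy: it keeps only the first principal component, finds it by power iteration, and uses made-up numbers. The function names are hypothetical, and the notes do not describe how the actual layer constructs or queries its sketches.

```python
import math


def column_means(rows):
    """Mean of each column of a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def first_component(rows, iterations=100):
    """Return (mean vector, first principal component) via power iteration."""
    mean = column_means(rows)
    centered = [[x - m for x, m in zip(row, mean)] for row in rows]
    # Starting vector; assumed not to be orthogonal to the component.
    v = [1.0] * len(mean)
    for _ in range(iterations):
        # w = X^T (X v), which is proportional to the covariance matrix times v.
        scores = [sum(a * b for a, b in zip(row, v)) for row in centered]
        w = [sum(s * row[j] for s, row in zip(scores, centered))
             for j in range(len(v))]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance along v; stop iterating
        v = [x / norm for x in w]
    return mean, v


def fill_missing(observation, mean, component):
    """Replace None entries using a least-squares fit of the observed
    entries onto the single retained component."""
    observed = [i for i, x in enumerate(observation) if x is not None]
    denom = sum(component[i] ** 2 for i in observed)
    if denom == 0.0:
        score = 0.0
    else:
        score = sum((observation[i] - mean[i]) * component[i]
                    for i in observed) / denom
    return [mean[i] + score * component[i] if x is None else x
            for i, x in enumerate(observation)]


# Made-up history: each row is one day, each column one sensor.
history = [[10.0, 20.0, 30.0], [20.0, 40.0, 60.0], [30.0, 60.0, 90.0]]
mean, component = first_component(history)
# The second sensor's reading is missing from the current observation.
filled = fill_missing([40.0, None, 120.0], mean, component)
```

Because the toy history is perfectly correlated, a single component reconstructs the missing reading exactly (80.0 up to floating-point error); less correlated data would need more components for comparable accuracy.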
Spatial Cartridge
- Used for joining a stream with a relational table
- The spatial cartridge extends the capabilities of StreamInsight.
- It determines how StreamInsight retrieves and interprets the underlying spatial information (e.g., road networks) by customizing the spatial operators and indexes for efficient access to large spatial data. Users can thus easily implement the functions or interfaces that provide the specialized behavior required in geostreaming applications.
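To make the idea of a customized spatial index concrete, here is a minimal sketch of a uniform grid over road segments that lets each streaming point be matched to a nearby segment without scanning the whole network. Everything in it is an assumption for illustration: the `GridIndex` class, the planar coordinates, and the two toy segments are hypothetical and do not reflect the cartridge's actual operators or the SQL Server spatial index.

```python
import math
from collections import defaultdict


def point_segment_distance(px, py, x1, y1, x2, y2):
    """Euclidean distance from point (px, py) to the segment (x1, y1)-(x2, y2)."""
    dx, dy = x2 - x1, y2 - y1
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - x1, py - y1)
    # Clamp the projection parameter so the closest point stays on the segment.
    t = max(0.0, min(1.0, ((px - x1) * dx + (py - y1) * dy) / length_sq))
    return math.hypot(px - (x1 + t * dx), py - (y1 + t * dy))


class GridIndex:
    """Hypothetical uniform grid index over line segments (planar coordinates)."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)

    def _cell(self, x, y):
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def insert(self, segment_id, x1, y1, x2, y2):
        """Register the segment in every cell touched by its bounding box."""
        cx1, cy1 = self._cell(min(x1, x2), min(y1, y2))
        cx2, cy2 = self._cell(max(x1, x2), max(y1, y2))
        for cx in range(cx1, cx2 + 1):
            for cy in range(cy1, cy2 + 1):
                self.cells[(cx, cy)].append((segment_id, x1, y1, x2, y2))

    def nearest(self, x, y):
        """Return the id of the closest segment registered in the point's
        cell or its eight neighbours, or None if those cells are empty."""
        cx, cy = self._cell(x, y)
        best_id, best_dist = None, math.inf
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for seg_id, x1, y1, x2, y2 in self.cells.get((cx + dx, cy + dy), ()):
                    d = point_segment_distance(x, y, x1, y1, x2, y2)
                    if d < best_dist:
                        best_id, best_dist = seg_id, d
        return best_id


# Toy road network with made-up coordinates.
index = GridIndex(cell_size=1.0)
index.insert("road-a", 0.0, 0.0, 5.0, 0.0)   # horizontal segment
index.insert("road-b", 2.0, -3.0, 2.0, 3.0)  # vertical segment
matches = [index.nearest(4.2, 0.3), index.nearest(2.1, 2.0), index.nearest(50.0, 50.0)]
```

Restricting the search to neighbouring cells is a design trade-off: lookups stay cheap, but a point farther than one cell from every segment gets no match, so the cell size would need to be chosen with the expected positioning error in mind.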